Validating Hyperparameters





Kerry Back

Hyperparameters

  • The max depth of trees in a forest and the number of trees are called hyperparameters.
  • “Hyperparameter” means that the value is specified ex ante rather than estimated by fitting the model.
  • The hidden layer sizes in a neural net are also hyperparameters.

Overfitting

  • Hyperparameters control how complex the model is.
  • More complex models will better fit the training data.
  • But we risk overfitting.
    • Overfitting means fitting our model to random peculiarities of the training data.
    • An overfit model will not work well on new data.
  • So more complexity is not necessarily better.
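As a sketch of the danger, here is a hypothetical example on synthetic data (not the dataset used later in these slides): a forest with no depth limit fits the training data very well but scores much worse on data it has not seen.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic data: y depends only weakly on X, the rest is noise (illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = 0.1 * X[:, 0] + rng.normal(size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# With no depth limit, the forest memorizes random peculiarities
# of the training data
deep = RandomForestRegressor(max_depth=None, random_state=0).fit(X_train, y_train)
print(deep.score(X_train, y_train))  # high in-sample R-squared
print(deep.score(X_test, y_test))    # much lower out of sample
```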

Validation

  • To choose hyperparameters from the data while minimizing the risk of overfitting, reserve part of the data as validation data.
  • Train with different hyperparameters on training data that does not include validation data.
  • Choose hyperparameters that perform best on validation data.
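The steps above can be sketched with scikit-learn's train_test_split; the synthetic data and the candidate depths here are illustrative, not the ones used later in these slides.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the real features (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = 0.2 * X[:, 0] + rng.normal(size=300)

# Reserve 30% of the data as validation data
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Train with each candidate hyperparameter value on the training data only
scores = {}
for depth in [2, 4, 8]:
    m = RandomForestRegressor(max_depth=depth, random_state=0)
    m.fit(X_train, y_train)
    scores[depth] = m.score(X_val, y_val)

# Choose the hyperparameter that performs best on the validation data
best_depth = max(scores, key=scores.get)
```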

Cross validation

  • Split data into, for example, 3 sets of equal size, say A, B, and C.
  • Train on A \(\cup\) B, assess performance on C
  • Train on A \(\cup\) C, assess performance on B
  • Train on B \(\cup\) C, assess performance on A
  • Choose hyperparameters with best average performance on A, B, and C.

Grid Search CV

from sklearn.model_selection import GridSearchCV
  • Pass a model or pipeline to GridSearchCV without specifying the hyperparameters.
  • Pass a set (“grid”) of hyperparameters to evaluate.
  • Fit the GridSearchCV.

Everything in one step

Fitting GridSearchCV does all of the following:

  • Split the data into subsets A, B, and C (the default is 5 subsets rather than 3).
  • Fit the model or pipeline on training sets and evaluate on validation sets.
  • Choose hyperparameters with best average performance.
  • Refit the model on the entire dataset using the best hyperparameters.
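After fitting, each of these steps is exposed as an attribute of the fitted object. A minimal sketch on synthetic data (the attribute names are standard scikit-learn; the data and grid are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data (illustrative only); n_estimators is small to keep this fast
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 0.2 * X[:, 0] + rng.normal(size=200)

cv = GridSearchCV(
    RandomForestRegressor(n_estimators=10, random_state=0),
    param_grid={"max_depth": [2, 4]},
)
cv.fit(X, y)

print(cv.best_params_)                    # winning hyperparameters
print(cv.cv_results_["mean_test_score"])  # average validation score per candidate
print(cv.best_estimator_)                 # refit on all data with the winner
```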

Random forest example

  • Use roeq and mom12m as features and rnk as the target for 2021-01, as before.
  • Define model without specifying max depth.
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(
  random_state=0
)
  • Specify depths to consider.
param_grid = {
    "max_depth": [4, 6, 8]
}

Define and fit GridSearchCV

cv = GridSearchCV(
  model, 
  param_grid=param_grid
)
cv.fit(X, y)

GridSearchCV(estimator=RandomForestRegressor(random_state=0),
             param_grid={'max_depth': [4, 6, 8]})

R-squared

cv.score(X, y)
0.12891622261119873


Best depth

cv.best_params_
{'max_depth': 4}


Make a prediction

import numpy as np
x = np.array([.1, .4]).reshape(1,2)
cv.predict(x)
array([0.50320694])

Neural net example

from sklearn.preprocessing import QuantileTransformer
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline

transform = QuantileTransformer(
    output_distribution="normal"
)
model = MLPRegressor(
  random_state=0
)
pipe = make_pipeline(transform, model)
  • We pass the pipeline to GridSearchCV.

  • We have to specify which part of the pipeline each hyperparameter belongs to.
  • Use the step name in lowercase, then a double underscore, then the parameter name.
param_grid = {
    "mlpregressor__hidden_layer_sizes": 
    [(4, 2), (8, 4, 2), (16, 8, 4, 2)]
}
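If you are unsure of the step names, the pipeline itself will tell you: make_pipeline names each step after its lowercased class name, and get_params() lists every valid grid key.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import QuantileTransformer
from sklearn.neural_network import MLPRegressor

pipe = make_pipeline(
    QuantileTransformer(output_distribution="normal"),
    MLPRegressor(random_state=0),
)

# make_pipeline names each step after its lowercased class name
print(list(pipe.named_steps))  # ['quantiletransformer', 'mlpregressor']

# Grid keys are '<step name>__<parameter name>'; get_params() lists them all
print("mlpregressor__hidden_layer_sizes" in pipe.get_params())  # True
```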

cv = GridSearchCV(
  pipe, 
  param_grid=param_grid
)
cv.fit(X, y)

GridSearchCV(estimator=Pipeline(steps=[('quantiletransformer',
                                        QuantileTransformer(output_distribution='normal')),
                                       ('mlpregressor',
                                        MLPRegressor(random_state=0))]),
             param_grid={'mlpregressor__hidden_layer_sizes': [(4, 2), (8, 4, 2),
                                                              (16, 8, 4, 2)]})

R-squared

cv.score(X, y)
0.06097467915277932


Best hidden layers

cv.best_params_
{'mlpregressor__hidden_layer_sizes': (4, 2)}


Make a prediction

import numpy as np
x = np.array([.1, .4]).reshape(1,2)
cv.predict(x)
array([0.4891965])